Analysis of Audio-Video Correlation in Vowels in Australian English
Abstract
This paper investigates the statistical relationship between acoustic and visual speech features for vowels. We extract these features from our stereo-vision audio-video (AV) speech data corpus of Australian English. A principal component analysis (PCA) is performed to determine which data points of each feature's parameter curve are most important for representing the shape of that curve. A canonical correlation analysis (CCA) then determines which principal components, and hence which data points of which features, correlate most strongly across the two modalities. Several strong correlations between acoustic and visual features are reported. In particular, F1, F2, and mouth height were strongly correlated. Knowledge of how acoustic and visual features correlate can be used to predict the presence of acoustic features from visual features, and thereby to improve the recognition rate of automatic speech recognition systems in acoustically noisy environments.
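The two-stage analysis described in the abstract (PCA within each modality, then CCA across modalities) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the data are synthetic, the curve shapes, sample sizes, and the function names `pca_scores` and `canonical_correlations` are hypothetical, and the code is not the authors' implementation. It uses only NumPy, computing PCA through a singular value decomposition and CCA through the standard QR-plus-SVD approach.

```python
import numpy as np

def pca_scores(X, n_components):
    """Return PCA scores and loadings of X (rows = tokens, columns = curve points)."""
    Xc = X - X.mean(axis=0)                       # centre each data point of the curve
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = Vt[:n_components]                  # weight of each data point per component
    return Xc @ loadings.T, loadings

def canonical_correlations(A, B):
    """Canonical correlations between two feature sets (QR + SVD method)."""
    Qa, _ = np.linalg.qr(A - A.mean(axis=0))      # orthonormal basis for centred A
    Qb, _ = np.linalg.qr(B - B.mean(axis=0))      # orthonormal basis for centred B
    s = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    return np.clip(s, 0.0, 1.0)                   # guard against rounding above 1

# Synthetic stand-in data (assumption): one latent factor drives both modalities.
rng = np.random.default_rng(0)
n_tokens, n_points = 200, 10
t = np.linspace(0.0, 1.0, n_points)
shared = rng.normal(size=(n_tokens, 1))
acoustic = shared * np.sin(np.pi * t) + 0.3 * rng.normal(size=(n_tokens, n_points))
visual = shared * t + 0.3 * rng.normal(size=(n_tokens, n_points))

# Stage 1: PCA per modality. Stage 2: CCA on the retained components.
acoustic_pcs, acoustic_loadings = pca_scores(acoustic, 3)
visual_pcs, visual_loadings = pca_scores(visual, 3)
rho = canonical_correlations(acoustic_pcs, visual_pcs)
print("canonical correlations:", np.round(rho, 3))
```

In this sketch, the rows of `acoustic_loadings` and `visual_loadings` indicate which data points of each curve dominate a component, which is the sense in which PCA identifies the most important points. The leading entries of `rho` indicate how strongly the retained components are related across the two modalities.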
Related resources
Statistical analysis of the relationship between audio and video speech parameters for Australian English
After decades of research, automatic speech processing has become more and more viable in recent years. Audio-video speech recognition has been shown to improve the recognition rate in noise-degraded environments. However, which audio and video speech parameters to choose for an optimal system and how they are related is still an open research issue. Here we present a number of statistical anal...
The Audio-Video Australian English Speech Data Corpus AVOZES
This paper presents the Audio-Video Australian English Speech data corpus AVOZES. It contains recordings of 20 speakers uttering a variety of phrases. The corpus was designed for research on the statistical relationship of audio and video speech parameters with an audio-video (AV) automatic speech recognition (ASR) task in mind, but may be useful for other research tasks. AVOZES is the first pu...
A Detailed Description of the AVOZES Data Corpus
The AVOZES data corpus has recently been made publicly available for other interested researchers. It is the first publicly available audio-video speech data corpus for Australian English. It contains recordings from 20 speakers and the sequences provide both a systematic coverage of the phonemes and visemes of Australian English as well as some application-driven utterances. AVOZES is also the...
Children's acquisition of English onset and coda /l/: articulatory evidence.
PURPOSE The goal of this study was to better understand how and when onset /l/ (leap) and coda /l/ (peel) are acquired by children by examining both the articulations involved and adults' perceptions of the produced segments. METHOD Twenty-five typically developing Australian English-speaking children aged 3;0 (years;months) to 7;11 participated in an elicited imitation task, during which aud...
Organizational Patterns of English Language Teachers’ Repair Practices
Despite the abundance of research on teachers’ repair practices in language classroom interaction, there are not enough conversation analytic studies on repair organization with the focus on the details of interaction in the context of EFL. Drawing on sociocultural and situated learning theories, this study explores the contingent nature of English language teachers’ org...